## G4F - File API Documentation with Web Download and Enhanced File Support

This document details the enhanced G4F File API, allowing users to upload files, download files from web URLs, and process a wider range of file types for integration with language models.

**Key Improvements:**

* **Web URL Downloads:**  Upload a `downloads.json` file to your bucket containing a list of URLs. The API will download and process these files.  Example: `[{"url": "https://example.com/document.pdf"}]`

* **Expanded File Support:**  Added support for additional plain text file extensions:  `.txt`, `.xml`, `.json`, `.js`, `.har`, `.sh`, `.py`, `.php`, `.css`, `.yaml`, `.sql`, `.log`, `.csv`, `.twig`, `.md`.  Binary file support remains for `.pdf`, `.html`, `.docx`, `.odt`, `.epub`, `.xlsx`, and `.zip`.

* **Server-Sent Events (SSE):**  SSE is now used to report file download and processing progress asynchronously. This improves the user experience, particularly for large files and multiple downloads.
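
The two file-related improvements above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the helper names `build_downloads_json` and `is_supported` are hypothetical, and the extension sets are copied from the list above, so the server's own lists may differ.

```python
import json
from pathlib import Path

# Extension lists copied from this document (assumption: the server may accept others).
PLAIN_TEXT_EXTENSIONS = {
    ".txt", ".xml", ".json", ".js", ".har", ".sh", ".py", ".php",
    ".css", ".yaml", ".sql", ".log", ".csv", ".twig", ".md",
}
BINARY_EXTENSIONS = {".pdf", ".html", ".docx", ".odt", ".epub", ".xlsx", ".zip"}


def build_downloads_json(urls):
    """Serialize a list of URL strings into the downloads.json format."""
    return json.dumps([{"url": url} for url in urls])


def is_supported(filename):
    """Return True if the file extension appears in either list (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    return suffix in PLAIN_TEXT_EXTENSIONS or suffix in BINARY_EXTENSIONS


print(build_downloads_json(["https://example.com/document.pdf"]))
print(is_supported("report.PDF"), is_supported("photo.png"))
```

Checking the extension on the client side before uploading may save a round trip, but the server remains the authority on what it accepts.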


**API Endpoints:**

* **Upload:** `/v1/files/{bucket_id}` (POST)

    * **Method:** POST
    * **Path Parameters:** `bucket_id` (an identifier you generate yourself, for example a UUID)
    * **Body:** `multipart/form-data` with files OR a `downloads.json` file containing URLs.
    * **Response:** JSON object with `bucket_id`, `url`, and a list of uploaded/downloaded filenames.


* **Retrieve:** `/v1/files/{bucket_id}` (GET)

    * **Method:** GET
    * **Path Parameters:** `bucket_id`
    * **Query Parameters:**
        * `delete_files`: (Optional, boolean, default `true`) Delete files after retrieval.
        * `refine_chunks_with_spacy`: (Optional, boolean, default `false`) Apply spaCy-based refinement.
    * **Response:** Streaming response with extracted text, separated by triple-backtick markers.  SSE updates are sent if the `Accept` header includes `text/event-stream`.
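
As a rough sketch of how the retrieve endpoint and its two query parameters fit together, the following helper builds the GET URL. The function name `build_retrieve_url` is hypothetical, the base URL is taken from the examples in this document, and sending the booleans as lowercase `true`/`false` strings is an assumption about what the server parses.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:1337"  # assumption: the local server used in this document's examples


def build_retrieve_url(bucket_id, delete_files=True, refine_chunks_with_spacy=False):
    """Build the GET URL for a bucket, including both optional query parameters."""
    query = urlencode({
        # Assumption: booleans are sent as lowercase "true"/"false" strings.
        "delete_files": str(delete_files).lower(),
        "refine_chunks_with_spacy": str(refine_chunks_with_spacy).lower(),
    })
    return f"{BASE_URL}/v1/files/{bucket_id}?{query}"


print(build_retrieve_url("my-bucket", delete_files=False))
```

Because `delete_files` defaults to `true`, passing `false` explicitly is presumably the way to keep a bucket available for more than one retrieval.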


**Example Usage (Python):**

```python
import requests
import uuid
import json

def upload_and_process(files_or_urls, bucket_id=None):
    if bucket_id is None:
        bucket_id = str(uuid.uuid4())
    
    if isinstance(files_or_urls, list): #URLs
        files = {'files': ('downloads.json', json.dumps(files_or_urls), 'application/json')}
    elif isinstance(files_or_urls, dict): #Files
        files = files_or_urls
    else:
        raise ValueError("files_or_urls must be a list of URLs or a dictionary of files")

    upload_response = requests.post(f'http://localhost:1337/v1/files/{bucket_id}', files=files)

    if upload_response.status_code != 200:
        # Nothing can be retrieved if the upload failed, so stop here
        raise RuntimeError(f"Upload failed: {upload_response.status_code} - {upload_response.text}")

    upload_data = upload_response.json()
    print(f"Upload successful. Bucket ID: {upload_data['bucket_id']}")

    # Retrieve the extracted text; the Accept header requests SSE progress updates
    response = requests.get(f'http://localhost:1337/v1/files/{bucket_id}', stream=True, headers={'Accept': 'text/event-stream'})
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data:'):
                try:
                    data = json.loads(line[5:])  # Remove the "data:" prefix
                    if "action" in data:
                        print(f"SSE Event: {data}")
                    elif "error" in data:
                        print(f"Error: {data['error']['message']}")
                    else:
                        print(f"File data received: {data}")  # Assuming it's file content
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON: {e}")
            else:
                print(f"Unhandled SSE event: {line}")
    response.close()
    return bucket_id

# Example with URLs
urls = [{"url": "https://github.com/xtekky/gpt4free/issues"}]
bucket_id = upload_and_process(urls)

# Example with files (the with block closes the file once the upload is done)
with open('document.pdf', 'rb') as f:
    bucket_id = upload_and_process({'files': ('document.pdf', f)})
```
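
The stream-handling loop in the example above can also be factored into a small function that is easy to test without a server. The following is a minimal sketch: the name `parse_sse_line` and the returned labels are hypothetical, and the event shapes (`action`, `error` with a `message`) are only those that appear in this document's examples.

```python
import json


def parse_sse_line(raw_line):
    """Classify one line of the stream.

    Returns a (kind, payload) tuple where kind is one of
    "action", "error", "data", "invalid" or "ignored".
    """
    line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
    if not line.startswith("data:"):
        return ("ignored", line)  # comments, blank lines, other SSE fields
    try:
        data = json.loads(line[len("data:"):].strip())
    except json.JSONDecodeError:
        return ("invalid", line)
    if not isinstance(data, dict):
        return ("data", data)
    if "error" in data:
        return ("error", data["error"])
    if "action" in data:
        return ("action", data)
    return ("data", data)


print(parse_sse_line(b'data: {"action": "done"}'))
print(parse_sse_line(b'data: {"error": {"message": "download failed"}}'))
```

Unlike the loop above, this sketch checks for `error` before `action`, so that an event carrying both keys is reported as an error. That ordering is a design choice of the sketch, not documented server behavior.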

**Usage of Uploaded Files:**
```python
from g4f.client import Client

# Enable debug mode
import g4f.debug
g4f.debug.logging = True

client = Client()

# Upload example file (upload_and_process is defined in the Python example above)
with open('demo.docx', 'rb') as f:
    bucket_id = upload_and_process({'files': ('demo.docx', f)})

# Send request with file:
response = client.chat.completions.create(
    [{"role": "user", "content": [
        {"type": "text", "text": "Describe this file."},
        {"bucket_id": bucket_id}
    ]}],
)
print(response.choices[0].message.content)
```

**Example Output:**
```
This document is a demonstration of the DOCX Input plugin capabilities in the software ...
```

**Example Usage (JavaScript):**

```javascript
function uuid() {
    return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
      (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
    );
}

async function upload_files_or_urls(data) {
    let bucket_id = uuid(); // Use a randomly generated key for your bucket

    let formData = new FormData();
    if (typeof data === "object" && data.constructor === Array) { //URLs
        const blob = new Blob([JSON.stringify(data)], { type: 'application/json' });
        const file = new File([blob], 'downloads.json', { type: 'application/json' }); // Create File object
        formData.append('files', file); // Append as a file
    } else { //Files
        Array.from(data).forEach(file => {
            formData.append('files', file);
        });
    }

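    const uploadResponse = await fetch("/v1/files/" + bucket_id, {
        method: 'POST',
        body: formData
    });
    if (!uploadResponse.ok) {
        // Do not open the SSE connection if the upload itself failed
        throw new Error(`Upload failed: HTTP ${uploadResponse.status}`);
    }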

    function connectToSSE(url) {
        const eventSource = new EventSource(url);
        eventSource.onmessage = (event) => {
            const data = JSON.parse(event.data);
            if (data.error) {
                console.error("Error:", data.error.message);
            } else if (data.action === "done") {
                eventSource.close(); // Stop listening so the browser does not try to reconnect
                console.log("Files loaded successfully. Bucket ID:", bucket_id);
                // Use bucket_id in your LLM prompt.
                const prompt = `Use files from bucket. ${JSON.stringify({"bucket_id": bucket_id})} to answer this: ...your question...`;
                // ... Send prompt to your language model ...
            } else {
                console.log("SSE Event:", data); // Update UI with progress as needed
            }
        };
        eventSource.onerror = (event) => {
            console.error("SSE Error:", event);
            eventSource.close();
        };
    }

    connectToSSE(`/v1/files/${bucket_id}`); //Retrieve and refine
}

// Example with URLs
const urls = [{"url": "https://github.com/xtekky/gpt4free/issues"}];
upload_files_or_urls(urls);

// Example with files (using a file input element)
const fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', () => {
    upload_files_or_urls(fileInput.files);
});
```

**Integrating with `ChatCompletion`:**

To use uploaded files in your client applications, include the `bucket_id` in your chat completion requests as an inline content part.

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Answer this question using the files in the specified bucket: ...your question..."},
        {"bucket_id": "your_actual_bucket_id"}
      ]
    }
  ]
}
```
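
The same request body can be assembled programmatically. The helper below is a minimal sketch that mirrors the JSON structure above; the name `build_bucket_request` is hypothetical and not part of the `g4f` package.

```python
import json


def build_bucket_request(question, bucket_id):
    """Return a chat completion request body that references a bucket."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"bucket_id": bucket_id},
                ],
            }
        ]
    }


body = build_bucket_request("Summarize the uploaded files.", "your_actual_bucket_id")
print(json.dumps(body, indent=2))
```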

**Important Considerations:**

* **Error Handling:** In both Python and JavaScript, check the HTTP status of every upload and retrieval request and handle `error` events in the SSE stream, so that failed uploads, downloads, and API calls do not go unnoticed.
* **Dependencies:** Ensure all required packages are installed (`pip install -U g4f[files]` for Python).
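
One possible way to handle transient failures is to retry the request a few times before giving up. The following is a generic sketch, not something the G4F File API provides: `with_retries` and `flaky_upload` are hypothetical names, and the linear backoff is just one reasonable choice.

```python
import time


def with_retries(operation, attempts=3, delay=0.5, retry_on=(ConnectionError, TimeoutError)):
    """Call operation() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller
            time.sleep(delay * attempt)  # Linear backoff between attempts


attempts_seen = []


def flaky_upload():
    """Stand-in for a real request: fails twice, then succeeds."""
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("temporary network problem")
    return "uploaded"


print(with_retries(flaky_upload, attempts=3, delay=0))
```

When wrapping calls made with `requests`, pass its own exception classes, for example `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, because they do not derive from the built-in `ConnectionError` and `TimeoutError`.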

---

## 📄 MarkItDown – Lightweight File-to-Text API (Alternative to G4F File API)

`markitdown` is a simple and lightweight alternative to the G4F File API for extracting **plain or markdown-formatted text** from uploaded files. While the G4F File API supports bucket-based multi-file workflows and streaming, `markitdown` is ideal for **quick, direct conversion of individual files** (e.g. `.pdf`, `.docx`, `.wav`).

### ✅ Key Features

- 🔄 Converts a wide range of files (PDF, DOCX, TXT, AUDIO, etc.) to markdown/plain text.
- 📤 Simple POST API: Send a file, receive extracted text.
- ⚡ Fast: no bucket, SSE, or URL fetching needed.
- 🎯 Ideal for use-cases where full document text is needed inline in chat prompts.

---

### 📦 Installation

```bash
pip install markitdown[all]
```

---

### 🐍 Example Usage (Python)

```python
import requests

def convert_with_markitdown(file_path):
    with open(file_path, 'rb') as file:
        response = requests.post('http://localhost:8080/api/markitdown', files={'file': file})
        if response.status_code == 200:
            data = response.json()
            return data['text']
        else:
            raise Exception(f"Conversion failed: {response.status_code} - {response.text}")

# Usage
text = convert_with_markitdown('example.pdf')
print(text)
```

---

### 🌐 Example Usage (JavaScript)

```html
<input type="file" id="fileInput" />

<script>
async function convertToMarkdown(file) {
    const formData = new FormData();
    formData.append('file', file);

    try {
        const response = await fetch('http://localhost:8080/api/markitdown', {
            method: 'POST',
            body: formData
        });

        if (!response.ok) throw new Error(`HTTP ${response.status}`);

        const data = await response.json();
        console.log("Converted Text:", data.text);

        // You can now inject data.text into a prompt or display in the UI.
    } catch (error) {
        console.error('Conversion failed:', error);
    }
}

document.getElementById('fileInput').addEventListener('change', async (e) => {
    const file = e.target.files[0];
    if (file) await convertToMarkdown(file);
});
</script>
```

---

### 💬 Integration with ChatCompletion

Once you retrieve `text` from `markitdown`, you can insert it into your LLM prompt as inline content:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "<text_from_markitdown>" },
        { "type": "text", "text": "Answer this question using the above content: ...your question..." }
      ]
    }
  ]
}
```

Example in Python:

```python
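from g4f.client import Client

client = Client()

# `text` is the string returned by convert_with_markitdown() in the example above
response = client.chat.completions.create(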
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)
```
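
The message structure above can be wrapped in a small helper. This is a minimal sketch with hypothetical names: `build_document_messages` is not part of `g4f` or `markitdown`, and the optional `max_chars` cut-off is an assumption added here because long documents may not fit into a model's context window.

```python
def build_document_messages(document_text, question, max_chars=None):
    """Build a messages list with the document text followed by the question.

    max_chars is a hypothetical safeguard: when set, the document text is cut
    to at most that many characters before it is placed in the prompt.
    """
    if max_chars is not None:
        document_text = document_text[:max_chars]
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": document_text},
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_document_messages(
    "# Title\n\nBody text ...", "Summarize this document.", max_chars=7
)
print(messages[0]["content"][0]["text"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.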

---

### 🔄 G4F File API vs MarkItDown

| Feature                        | G4F File API                        | MarkItDown API                |
|-------------------------------|-------------------------------------|-------------------------------|
| Upload Files                  | ✅ Yes                              | ✅ Yes                        |
| Web URL Downloads             | ✅ Yes via `downloads.json`         | ❌ No                         |
| SSE Progress Streaming        | ✅ Yes                              | ❌ No                         |
| Markdown/Text Output          | Raw/structured                      | Clean markdown/plain text     |
| Bucket/File Management        | ✅ Multi-file                        | ❌ Single-file only           |
| Use Case                      | Multi-step pipelines, large workflows | Quick extraction, inline usage |

---

### 📌 Summary

- Use **G4F File API** when:
  - You need to upload/download many files.
  - You want streamed SSE progress.
  - You're building a multi-step or large workflow.

- Use **MarkItDown** when:
  - You want quick markdown/plain text extraction from a single file.
  - You plan to inject the text directly into an LLM prompt.
  - You prefer a simple one-call API.

---

[Return to Documentation](README.md)